@Manishearth
Member

@Manishearth Manishearth commented Oct 18, 2025

Fixes #6459, fixes #7026

This is an attempt to define calendars in terms of my position in #6970 (comment).

For each calendar, I have first attempted to give enough information to identify it unambiguously. Typically, this means stating whether it is lunar or solar, describing its leap rules briefly, and, if it is a civil or otherwise officially used calendar, naming a country where it is the official calendar as of 2025.

For solar calendars, I have identified when the calendar was first introduced and explicitly called it out as being proleptic before that.

For lunar calendars, I have attempted to unambiguously identify the exact algorithm when there is one. When there is not, I have defined the calendar as the ground truth in a region for a given range of dates, and also specified what we do outside that range. There is some room to adjust what we guarantee versus what we do now. I have also attempted to specify the ways in which these implementations may change in the future.
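The "ground truth for a range of dates, defined behavior outside it" approach described above can be sketched as a table lookup with an algorithmic fallback. This is a minimal illustration only: `YearTable`, `Source`, and the single days-per-year field are hypothetical and do not reflect how ICU4X actually stores its Chinese or Umm al-Qura data.

```rust
/// Where a year's data came from (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    /// Taken from a table of published ground-truth data.
    GroundTruth,
    /// Computed by an approximating algorithm outside the table's range.
    Computed,
}

/// Hypothetical per-year data: only the number of days in each year.
pub struct YearTable {
    /// First year covered by `days`.
    pub start_year: i32,
    /// Days in each consecutive year, starting at `start_year`.
    pub days: Vec<u16>,
}

impl YearTable {
    /// Returns the table value for years the table covers, and falls back
    /// to `approximate` (plus a `Computed` marker) for every other year.
    pub fn days_in_year(&self, year: i32, approximate: impl Fn(i32) -> u16) -> (u16, Source) {
        // Widen to i64 so the subtraction cannot overflow.
        let offset = i64::from(year) - i64::from(self.start_year);
        if offset >= 0 && (offset as u64) < self.days.len() as u64 {
            return (self.days[offset as usize], Source::GroundTruth);
        }
        (approximate(year), Source::Computed)
    }
}
```

The design point is that callers can tell which guarantee applies: values inside the table's range track published data, while values outside it are only as good as the fallback algorithm.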

///
/// This calendar is intended to represent the traditional Chinese lunar calendar as used
/// officially in the People's Republic of China as of 2025. This takes a best-effort approach
/// to match past and future dates as used in the region for the year 1900 onwards.
Member

Issue: don't name a specific year, because then we need to debate "why 1900". Just say that it intends to match ground truth dates over an arbitrarily long range. Then in the next paragraph we can name 1900 as part of the data source.

Member Author

This is deliberate. I am defining the calendar as matching ground truth in a particular time period, as its core definition. And then on top of that we specify additional behaviors.

I could instead choose to define it as matching ground truth for a different time period. I would prefer not to define it as matching ground truth for an unspecified period, but we could do that too. This is the core definitional discussion we need to have.

I want these definitions to be useful, so actually giving a minimum that is the core definition is good. If you'd like I can change this number to 1950. I think we should have a number where "best effort" is correct.

Member Author

@Manishearth Manishearth Oct 18, 2025

Defining this as 1950 and potentially increasing it in the future is one easy 2.1 option.

The answer to "why 1900" is that the ground truth 1900 onwards is easily available in multiple sources so there is very little chance of needing to deal with discrepancies (which is the problem when attempting to define a calendar: you don't want to deal with "well these people handled the calendar this way, and these people handled it this way"). That does start to become a problem the further back you go.

Member

Right, "why 1900" is "the ground truth 1900 onwards is easily available in multiple sources". That's not how we define the calendar, though. The definition is that it is a calendar that matches ground truth in China. The implementation constraint is that we are choosing to use the data source that has data for 1900 onwards.

Member Author

@Manishearth Manishearth Oct 20, 2025

That's fine, but this statement is serving a different purpose that is important to the definition.

Specifically, it is important to define it as something that we make a best effort attempt to match ground truth in 2026, and 2027, and so on, that does not change meaning as time moves on. This distinguishes it from a calendar that is attempting to be stable in behavior even when projected ground truth changes.

So I do not wish to say "for future dates", I wish to say "best effort attempt to match ground truth from $year onwards". I don't have a strong reason to care about what $year is, it should just not be something we need to update every year. It could be 1900, or 1950, or 2000, or 2025 (kept static). I think picking a year further in the past is nice because it gives people an idea of what they can consider stable, but we also have a separate section on stability anyway.

If you can offer wording for the definition that works for that, I can get rid of this bit.

Member Author

Shane convinced me to instead just say "best effort attempt for future dates".

/// # Precise definition and limits
///
/// This calendar is defined algorithmically as a solar calendar that has 13 months, with a leap day in
/// the 13th month every 4 years, as used by the Coptic orthodox church as of 2025. This calendar extends proleptically
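The Coptic definition quoted above (13 months, with a leap day in the 13th month every 4 years) is small enough to state as code. A minimal sketch, assuming the standard arrangement of twelve 30-day months followed by a 5-day 13th month, with leap years falling on Coptic years congruent to 3 mod 4; the function names are hypothetical and not ICU4X API.

```rust
/// Coptic leap years are the years congruent to 3 mod 4 (3, 7, 11, ...).
/// `rem_euclid` keeps the rule correct for proleptic, non-positive years.
pub fn is_coptic_leap_year(year: i32) -> bool {
    year.rem_euclid(4) == 3
}

/// Days in a Coptic month: months 1-12 have 30 days, and the short 13th
/// month has 5 days, or 6 in a leap year. Returns `None` for invalid months.
pub fn coptic_days_in_month(year: i32, month: u8) -> Option<u8> {
    match month {
        1..=12 => Some(30),
        13 => Some(if is_coptic_leap_year(year) { 6 } else { 5 }),
        _ => None,
    }
}
```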
Member

Nit, here and elsewhere: rather than 2025, name the year the calendar was adopted in an official status

Member Author

This was a deliberate choice: it is a calendar disambiguator in case the church decides to change things around. It's supposed to be modern. I could say "in 2025" instead of "as of 2025" if that makes it clearer.

"What calendar did $entity use in 2025" is an easy thing to verify. "what calendar did $entity use in $far_past_year" is often hard, especially since identity is tricky. There have been many splits and rejoins of countries and churches and this avoids dealing with that problem when we are doing a baseline disambiguation.

Member Author

It also avoids the problem of discovering later that the Coptic church actually tweaked the precise algorithm in 1000 AD or something, or potentially made a mix-up with incorrect calculations (similar to the pre-4 AD Julian calendar). An algorithm tweak happened multiple times with Hebrew; I am not 100% convinced it has never happened with the other calendars.

ICU4X encodes useful calendars that are in modern use (plus Julian). When attempting to unambiguously identify a calendar, talking about its modern use is more important than talking about it in terms of when it was introduced, because what we care about is modern use.

Member

I'd prefer saying "as currently used by the Coptic church". Otherwise, next January we have to either update them all to 2026 or our library will look outdated.

Member Author

No, "as currently used by the Coptic church" means that this is defined as "whatever the church uses" and is prone to issues if the church ever decides to discard this calendar.

I could pick 2000 instead if people want a nice round number that is clear as an arbitrary number.

We should not be updating the date every year. This is a part of the disambiguation, "the calendar used by the Coptic church in 2025" is unambiguous and will remain so. "the calendar currently used by the Coptic church" does not have that property.

Member

They've used this for almost 2000 years now, I would consider this very stable.

Member

I don't think we should focus on who uses these calendars in 2025. The Coptic calendar was used by the Coptic church since its inception, that's its whole identity. Saying it's used by the Coptic Church "as of 2025" sounds like they keep changing calendars, which is not true. If you see this code in 2026, you have to wonder what calendar the Coptic church uses currently, which is not what this documentation should make you wonder.

Member Author

They've used this for almost 2000 years now, I would consider this very stable.

I could say "in at least the period XXX CE - 2025 CE" but that makes it sound like they stopped in 2025.

I do not wish to make definite claims about the future here.

Member Author

Shane convinced me to say "as of the publication date of this crate".

@Manishearth Manishearth requested a review from sffc October 18, 2025 07:10
Member

@robertbastian robertbastian left a comment

I'm confused by the different levels of details for the "precise definition" of the different calendars. But then, if we link to the Wikipedia page of the calendar, do we also need to provide a "precise definition" if that's unambiguous (Gregorian et al)?

I also don't like classifying calendars as "algorithmic" and (implicitly) non-algorithmic. We don't provide a single non-algorithmic calendar (the only non-algorithmic calendar I'm aware of is an observational Hijri). We sometimes hardcode data because we don't implement the algorithms, but the calendars are still very much algorithmic.

/// # Precise definition and limits
///
/// This calendar is intended to represent the traditional Chinese lunar calendar as used
/// officially in the People's Republic of China as of 2025. This takes a best-effort approach
Member

let's try to be generic over "China", this is also used in Taiwan

Suggested change
/// officially in the People's Republic of China as of 2025. This takes a best-effort approach
/// officially in China as of 2025. This takes a best-effort approach

Member Author

I chose a polity because I am trying to unambiguously indicate a particular calendar.

Member

I think "China" is better because the calendar is used throughout the region. If different sub-regions adopt different rules, then we might want to get more specific with the naming. And in that case, I might say "in Beijing" rather than "in China".

Member Author

I decided that saying "China" is fine, and if the two polities ever diverge here, we have sufficient leeway to decide what to do then.

/// # Precise definition and limits
///
/// This calendar is intended to represent the traditional Chinese lunar calendar as used
/// officially in the People's Republic of China as of 2025. This takes a best-effort approach
Member

it's not a "best-effort approach" from the year 1900 onwards; it "matches ground truth" from the year 1900 onwards

Member Author

No it doesn't. It is a best-effort attempt to match ground truth from 1900 onwards. It succeeds for the years 1900-2025. We cannot make definite statements beyond that. I can frame that better, but then we get to the problem you are already complaining about, where there is a year in the docs we need to keep updating.

Member Author

Based on feedback from Shane, I've changed this to be "best effort for future dates"

Comment on lines 66 to 67
/// This calendar is defined algorithmically as a solar calendar with a leap month every 4 years, as used
/// by the Roman Empire since 1 CE.
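For reference, the Julian rule inserts a leap day (29 February), not a leap month, into every fourth year, with no century exception. A minimal sketch of that rule using astronomical year numbering (1 BCE is year 0), applied proleptically and ignoring the irregular leap years actually observed in the calendar's first decades; the helpers are hypothetical, not ICU4X API.

```rust
/// Julian leap years: every year divisible by 4, with no century exception.
/// Astronomical numbering (1 BCE = year 0) keeps the rule uniform across eras.
pub fn is_julian_leap_year(year: i32) -> bool {
    year.rem_euclid(4) == 0
}

/// Days in a Julian month; February receives the leap day.
pub fn julian_days_in_month(year: i32, month: u8) -> Option<u8> {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => Some(31),
        4 | 6 | 9 | 11 => Some(30),
        2 => Some(if is_julian_leap_year(year) { 29 } else { 28 }),
        _ => None,
    }
}
```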
Member

"as used by the Roman Empire since 1 CE" is a weird thing to say. the Roman empire doesn't exist anymore, many different entities have used this calendar for centuries, but this is all already explained in previous paragraphs.

Member Author

Yeah, this could be framed better.

///
/// # Precise definition and limits
///
/// This calendar is defined algorithmically as a solar calendar with a leap month every 4 years, as used
Member

not a very precise definition of the calendar

///
/// # Precise definition and limits
///
/// This calendar is defined as a solar calendar which uses the astronomical vernal equinox as its new year, and
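To make the month-length question concrete: in the modern Iranian form of the calendar, the first six months have 31 days, the next five have 30, and the last month (Esfand) has 29 days, or 30 in a leap year. A minimal sketch of just that part, with a hypothetical function name. Whether a year is leap depends on when the vernal equinox falls, which this sketch does not compute; it takes the answer as an input instead.

```rust
/// Days in a month of the Solar Hijri (Persian) calendar.
/// `is_leap_year` must come from elsewhere (e.g. an equinox calculation);
/// this function only encodes the fixed month structure.
pub fn persian_days_in_month(is_leap_year: bool, month: u8) -> Option<u8> {
    match month {
        1..=6 => Some(31),
        7..=11 => Some(30),
        12 => Some(if is_leap_year { 30 } else { 29 }),
        _ => None,
    }
}
```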
Member

how many months? how long?

Member Author

Is it necessary to be that precise in each calendar? Expressing some characteristics and a polity where it is official is pretty unambiguous.

@Manishearth
Member Author

I also don't like classifying calendars as "algorithmic" and (implicitly) non-algorithmic. We don't provide a single non-algorithmic calendar (the only non-algorithmic calendar I'm aware of is an observational Hijri). We sometimes hardcode data because we don't implement the algorithms, but the calendars are still very much algorithmic.

"algorithmic" is about the definition of the calendar. The Gregorian calendar is defined as a precise, computable algorithm. The United States deciding to add a day to a year does not change that.

This is not true for UAQ or Chinese, where, although there is an algorithm we implement, we are also following ground truth.
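The contrast drawn above can be made concrete on the Gregorian side: its entire leap structure is a fixed arithmetic rule, so no newly published data can change its output. A minimal sketch with hypothetical helper names, using astronomical year numbering for proleptic years.

```rust
/// Gregorian leap years: divisible by 4, except century years that are
/// not divisible by 400. A pure function of the year, with no data input.
pub fn is_gregorian_leap_year(year: i32) -> bool {
    year.rem_euclid(4) == 0 && (year.rem_euclid(100) != 0 || year.rem_euclid(400) == 0)
}

/// Days in a Gregorian year.
pub fn gregorian_days_in_year(year: i32) -> u16 {
    if is_gregorian_leap_year(year) { 366 } else { 365 }
}
```

A data-backed calendar such as UAQ has no equivalent closed-form rule over its whole range; its year lengths come from published tables, with an algorithm only filling in where data is absent.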

But then, if we link to the Wikipedia page of the calendar, do we also need to provide a "precise definition" if that's unambiguous (Gregorian et al)?

Not necessarily, I was trying to be as redundant as possible. This did end up with different levels of detail. I don't think consistency across calendars is important here when it comes to level of detail.

Comment on lines +55 to +54
/// This calendar generically covers any pure lunar calendar used liturgically in Islam,
/// with 12 months each of length 29 or 30, with an epoch intended to mark the Hijrah in 622 CE.
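As one concrete member of the family described above, the arithmetic (tabular) Hijri calendar alternates 30- and 29-day months and adds a 30th day to the 12th month in 11 leap years of each 30-year cycle. A minimal sketch of one common variant, in which years 2, 5, 7, 10, 13, 16, 18, 21, 24, 26 and 29 of each cycle are leap. Other tabular variants use a slightly different leap-year set, and observational or Umm al-Qura reckoning follows no fixed rule of this kind. The helpers are hypothetical and are not the `hijri::Rules` trait.

```rust
/// Leap years of one tabular Hijri variant. The expression selects exactly
/// years 2, 5, 7, 10, 13, 16, 18, 21, 24, 26 and 29 of each 30-year cycle.
/// i64 arithmetic avoids overflow for extreme years.
pub fn is_tabular_hijri_leap_year(year: i32) -> bool {
    (14 + 11 * i64::from(year)).rem_euclid(30) < 11
}

/// Odd months have 30 days and even months 29; in leap years the 12th
/// month gets a 30th day, giving 355 days instead of 354.
pub fn tabular_hijri_days_in_month(year: i32, month: u8) -> Option<u8> {
    match month {
        1..=11 => Some(if month % 2 == 1 { 30 } else { 29 }),
        12 => Some(if is_tabular_hijri_leap_year(year) { 30 } else { 29 }),
        _ => None,
    }
}
```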
Member

Note: I'm fine saying that the epoch is the Hijrah, but I thought we previously said we needn't make that commitment, which is why we added ECMA reference year functions to the hijri::Rules trait.

Member Author

I think from a calendrical definition perspective saying the epoch is Hijrah is probably good.

Member

we need that actually, because PackedHijriYearData needs it

@Manishearth Manishearth requested a review from sffc October 20, 2025 23:13
sffc previously approved these changes Oct 20, 2025
Member

@robertbastian robertbastian left a comment

For most calendars, the new "Precise definition and limits" section just repeats things that are already in the documentation a few lines higher. I don't see the point of that. Some calendars might benefit from additional docs, but I think for most calendars, this PR actually makes the docs worse due to repetition.

/// The [`Rules`] used in China.
///
/// This type agrees with the official data published by the
/// # Precise definition and limits
Member

you could put a "precise definition" heading at the top of pretty much every doc comment. if it doesn't separate from some other heading it's pointless

///
/// # Precise definition and limits
///
/// This calendar is defined algorithmically as a solar calendar that is identical to the proleptic Gregorian calendar
Member

this is the same as the first paragraph

///
/// # Precise definition and limits
///
/// This calendar is defined algorithmically as a solar calendar that is identical to the proleptic Gregorian calendar in everything
Member

this is already said in the first two paragraphs

///
/// # Precise definition and limits
///
/// This calendar is defined algorithmically as a solar calendar with a leap month every 4 years, as was used
Member

this is already explained in previous paragraphs, and this is not very exact

Member Author

@Manishearth Manishearth Oct 21, 2025

I think specifying the year it was used in the Roman empire is quite exact, and the best we can do for a historical calendar. Do you have another suggestion?

@Manishearth
Member Author

(last commit across push is "shane comments", which has been reviewed)

@Manishearth
Member Author

I think for most calendars, this PR actually makes the docs worse due to repetition.

I think it's valuable to have a consistent section that pins down the various details about the calendar definition.

@Manishearth
Member Author

So my goal with this section is for there to be a one stop shop for the calendar definition on each calendar. I'm happy to roll things from the main calendar docs into the definition (which is what I did for Chinese/Korean, and after review, UAQ).

For the simple calendars I think it's fine for the definition to be a bit repetitive. I don't think it makes it worse.

@robertbastian
Member

I'm not aligned on that goal

@Manishearth
Member Author

@robertbastian What would you like to see here, then? I'm trying to do what was asked of me here, since people felt that this work would unblock the discussions. I don't really wish to be stuck in the position of making the perfect docs here. I want to make an improvement.

I'm really not convinced that this is in any way worse than the status quo, either.

@robertbastian
Member

I'm trying to do what was asked of me here

The issue asks to define validity ranges for each calendar. For the solar calendars, it should suffice to say "before x this is proleptic" (which Julian and Gregorian already do in much detail). For Hijri and LunarChinese we need more detail, but we already have precise documentation on the validity ranges (we didn't have this when #6459 was filed).
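A "before x this is proleptic" statement reduces to a single date comparison. A minimal sketch with hypothetical names, using the Gregorian calendar's introduction on 15 October 1582 as the example cutoff; each calendar would supply its own inception date.

```rust
/// A (year, month, day) triple in the calendar being checked.
pub type Ymd = (i32, u8, u8);

/// First day on which the Gregorian calendar was in official use.
pub const GREGORIAN_INCEPTION: Ymd = (1582, 10, 15);

/// Returns true if `date` falls before `inception`, i.e. the calendar is
/// being extended proleptically to reach it. Rust tuples compare
/// lexicographically, so year, then month, then day are checked in order.
pub fn is_proleptic(date: Ymd, inception: Ymd) -> bool {
    date < inception
}
```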

@Manishearth
Member Author

For the solar calendars, it should suffice to say "before x this is proleptic" (which Julian and Gregorian already do in much detail).

Right, what is the problem with duplicating that text in the section for consistency?

I'd be happy to move that detail into this section if preferred.

I'm also not opposed to removing this section from the solar calendars, but at least for Buddhist this is an improvement.

@Manishearth
Member Author

I feel like I've gotten mixed signals here where in some comments you've asked me to be more precise in the documentation on the simple calendars and other comments you've asked me to not have that section at all.

@robertbastian
Member

If we need a "precise definition" section for each calendar, it should be a precise definition, with the same amount of precision for every calendar. That includes listing how long each month is, and where leap days or months are.

However, I don't think we need that section. For the Buddhist calendar the first paragraph should say that it's proleptic before 1941, this information should not be in the fine print.

I'm not a fan of listing who uses which calendar either, because as we have established, that is not stable.
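For the Buddhist case mentioned above, the year numbering is a fixed offset of 543 from the Common Era, and Thailand moved the start of its year from 1 April to 1 January in 1941. A minimal sketch of the offset, with hypothetical names, that applies the January new year to every year; this simplification is exactly what makes dates before 1941 proleptic with respect to Thai civil usage.

```rust
/// Offset between Buddhist Era and Common Era year numbers.
pub const BUDDHIST_ERA_OFFSET: i32 = 543;

/// Converts a proleptic Gregorian year to a Buddhist Era year by a fixed
/// offset. This does not model the April new year used before 1941, so a
/// date in January-March 1940 gets BE 2483 here, not the BE 2482 used at the time.
pub fn buddhist_year_from_gregorian(gregorian_year: i32) -> i32 {
    gregorian_year + BUDDHIST_ERA_OFFSET
}
```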

@Manishearth
Member Author

I'm not a fan of listing who uses which calendar either, because as we have established, that is not stable.

This is exactly why I am saying "as of the publication date".

I think there is an important purpose being served here. If, say, India tweaks the calendar in the future, in a way that is not the current algorithm, we are not automatically committed to supporting that tweak. Instead, we have identified the precise calendar by saying "it's the one used by India in $year" (which on Shane's suggestion I've tweaked to be publication date), and that documentation will be forever correct because the past is not malleable. When these governments make calendar changes we can add a new calendar code or something.

This is in contrast with UAQ and Chinese, where if the governments release new data we just adopt it, unless they radically change the calendar entirely.

@robertbastian
Member

Instead, we have identified the precise calendar by saying "it's the one used by India in $year"

This is the whole calendar identity discussion. The Indian calendar is not the calendar used by India in 2025; it's the calendar described in https://en.wikipedia.org/wiki/Indian_national_calendar. If India makes major changes to the calendar they are using, it's not going to be the Indian National Calendar anymore. This argument is fairly weak for Indian, I admit, but for Coptic it's not: the Coptic calendar has been used since the 3rd century, and there is no authority that can or will change it. Even if both Coptic churches agreed now, after 1800 years, to change their calendar, that would be a new calendar.
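For comparison, the month structure of the Indian national calendar is compact enough to write down as rules. A minimal sketch, with hypothetical names, assuming the Saka year is offset from the Gregorian year by 78 and that a Saka year is leap exactly when the corresponding Gregorian year is.

```rust
/// Offset between Saka era and Common Era year numbers.
pub const SAKA_OFFSET: i32 = 78;

/// Days in a month of the Indian national calendar: Chaitra (month 1) has
/// 30 days, or 31 when Saka + 78 is a Gregorian leap year; months 2-6 have
/// 31 days; months 7-12 have 30 days.
pub fn indian_days_in_month(saka_year: i32, month: u8) -> Option<u8> {
    let g = saka_year + SAKA_OFFSET;
    let gregorian_leap = g.rem_euclid(4) == 0 && (g.rem_euclid(100) != 0 || g.rem_euclid(400) == 0);
    match month {
        1 => Some(if gregorian_leap { 31 } else { 30 }),
        2..=6 => Some(31),
        7..=12 => Some(30),
        _ => None,
    }
}
```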

@Manishearth
Member Author

Manishearth commented Oct 22, 2025

This is the whole calendar identity discussion. The Indian calendar is not the calendar used by India in 2025, it's the calendar described in https://en.wikipedia.org/wiki/Indian_national_calendar

Well, yes. I was asked to make this PR so we could move forward on the calendar identity discussion. This is exactly why I think every calendar should have this section. If you want to debate the precise definition we use, I'm happy to do that, but then we should debate the identity. We can define the calendar either by a very precise algorithm (I think that's overkill), or as "as used by India in $year", or "as used by the government of india, whatever they may decide", or as "as described by this wikipedia page". We should pick that. By and large I am uncomfortable with treating Wikipedia as a source of truth of calendar definition.

I would prefer to not debate the existence of this section based on calendar identity questions just because removing the section would "default" to a particular identity that you prefer. The fact that we're having this discussion tells me that this section is necessary. Let's talk about what goes in it.

I do think the justification for having this section doesn't hold as strongly for Coptic. At that point it's just a consistency thing.

@Manishearth
Member Author

#7151

robertbastian added a commit that referenced this pull request Oct 23, 2025
This is my attempt at #7123

Fixes #6459, fixes #7026

* List the names of months and their lengths for every calendar
* Explain leap year rules for each calendar
* Document who created a calendar instead of who uses it, as that is subject to change
* Document each calendar's inception date and that we implement it proleptically before that
* Make a clearer distinction between "this calendar" and "this implementation"
* Add a section on calendar drift to most calendars (todo for Hebrew)

For both the Persian and the Ethiopian calendar, I'm struggling to find information about where they come from; they still need work.

This PR adds general docs to the `EastAsianTraditional` and `Hijri` types; it does not yet document the various `Rules` implementations.
Development

Successfully merging this pull request may close these issues.

Document calendar validity ranges
Buddhist: should it have an April new year before 1941?
